Journal of Bioinformatics and Systems Biology
Fortune Journals
Preprints posted in the last 90 days, ranked by how well they match Journal of Bioinformatics and Systems Biology's content profile, based on 14 papers previously published here. The average preprint has a 0.02% match score for this journal, so anything above that is already an above-average fit.
Deng, F.; Li, H.; Sun, D.; Duan, G.; Sun, Z.; Xue, G.
High levels of protein expression are generally desirable in industry and research, and codon optimization is widely used to achieve them. Codon optimization methods fall into two branches: classical methods, which build cost functions from empirical rules, and AI methods, which use neural networks to learn codon choice principles from endogenous genes. Here we develop one codon optimization tool for each branch, OptimWiz 2.1 and OptimWiz 3.0, respectively. Fusion protein fluorescence assays indicate that both OptimWiz 2.1 and OptimWiz 3.0 outperform all other commercially available codon optimization tools. Principles of codon optimization also emerge from the machine learning process in both tools.
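Classical codon optimization scores a sequence against empirical codon usage. As an illustration only, the sketch below computes the Codon Adaptation Index (CAI), one widely used classical metric. It is not claimed to be the OptimWiz 2.1 cost function, and the three-amino-acid codon table and reference genes are hypothetical simplifications.

```python
from collections import Counter
from math import exp, log

# Synonymous codons for a few amino acids (standard genetic code).
# Hypothetical subset for brevity; a real tool would cover all sense codons.
SYNONYMS = {
    "L": ["TTA", "TTG", "CTT", "CTC", "CTA", "CTG"],
    "K": ["AAA", "AAG"],
    "E": ["GAA", "GAG"],
}
CODON_TO_AA = {c: aa for aa, syn in SYNONYMS.items() for c in syn}


def split_codons(seq):
    """Split a coding sequence into consecutive in-frame triplets."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def relative_adaptiveness(reference_genes):
    """w(codon) = count(codon) / count of the most used synonymous codon."""
    counts = Counter(c for gene in reference_genes for c in split_codons(gene) if c in CODON_TO_AA)
    weights = {}
    for syn in SYNONYMS.values():
        top = max(counts[c] for c in syn)
        for c in syn:
            # Unseen codons get a pseudocount of 0.5 to avoid log(0) (an assumption).
            weights[c] = max(counts[c], 0.5) / max(top, 0.5)
    return weights


def cai(seq, weights):
    """Codon Adaptation Index: geometric mean of w over the scored codons in seq."""
    ws = [weights[c] for c in split_codons(seq) if c in weights]
    if not ws:
        raise ValueError("no scorable codons in sequence")
    return exp(sum(log(w) for w in ws) / len(ws))
```

A CAI of 1.0 means every codon is the one most used in the reference set; lower values indicate rarer synonymous choices.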
West, C.; Dineen, L.; LaBella, A. L.
Transfer RNAs (tRNAs) are known for delivering amino acids to the growing polypeptide chain during translation. They can also influence gene expression, especially during nutrient starvation, through differential tRNA expression and modification. tRNAs share a highly consistent cloverleaf structure, but relatively few known regulatory elements govern this conserved structure despite the 20 different standard isotypes. This study examines gene enrichment patterns near tRNA genes in 1,154 fungal genomes. Genes in proteasome regulation, ion transport, and rRNA pathways were significantly closer to tRNAs than genes in other pathways. These results were consistent across KEGG over-representation analysis (ORA), KEGG Gene Set Enrichment Analysis (GSEA), and Gene Ontology (GO) analysis. Proteasome function, ion transport, and RNA are all important aspects of protein production and regulation, suggesting that genes required for protein synthesis and quality control, including tRNAs, are located near each other. Protein regulation is an energetically expensive process, and local co-regulation could increase efficiency and stress impacts on proteins.
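KEGG over-representation analysis is commonly implemented as a one-sided hypergeometric test: given the genes near tRNAs, how surprising is the overlap with a pathway? A minimal standard-library sketch follows; it assumes this usual formulation, and the argument values in the checks are hypothetical, not taken from the study.

```python
from math import comb


def ora_pvalue(N, K, n, k):
    """One-sided hypergeometric p-value P(X >= k).

    N: total annotated genes in the background
    K: background genes belonging to the pathway
    n: genes in the query set (e.g. genes near tRNAs)
    k: query genes that belong to the pathway (observed overlap)
    """
    total = comb(N, n)
    # Sum the probability of every overlap at least as large as the observed one.
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)) / total
```

When many pathways are tested, the resulting p-values would normally be corrected for multiple testing (for example, with the Benjamini-Hochberg procedure).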
May, G. E.; Akirtava, C.; McManus, J.
Since the discovery of viral Internal Ribosome Entry Sites (IRESes), researchers have sought to find similar elements in mammalian host genes, termed "cellular IRESes". However, the plasmid systems used to measure cellular IRES activity are vulnerable to false positives due to promoter activity in candidate IRESes. Orthogonal methods are needed to validate putative IRESes while carefully avoiding artifacts known to cause false positives. Recently, Koch et al. proposed approaches for studying IRESes, primarily circular RNA-generating plasmids, and for validating mRNA transcripts using smFISH and qRT-PCR. Here, we demonstrate confounding variables and artifacts in each of these approaches that can lead to inappropriate conclusions about potential cellular IRES activity. We show the back-splicing circRNA plasmid creates linear mRNA artifacts associated with false-positive IRES signals. Using orthogonal, gold-standard assays validated with viral IRESes, we find putative cellular IRESes reported using the back-splicing plasmid have no IRES activity. Furthermore, we demonstrate that smFISH and qRT-PCR can misidentify nuclear non-coding RNAs as mRNAs and we validate a single molecule sequencing assay for identifying genuine mRNA 5′ ends. Our work establishes reliable methods for robust transcript annotation and IRES studies that avoid documented artifacts arising from bicistronic and back-splicing circRNA plasmid reporters.
Sindhi, N. A.; Pawar, N.; Dixson, J.; Garcia, D.
Predicting protein-protein interactions is a fundamental problem in molecular biology. Experimental approaches for identifying protein-protein interactions are time-consuming and labor-intensive, motivating the development of efficient computational alternatives, including machine learning-based methods. However, conventional machine learning methods often rely on manually engineered features that require substantial domain expertise. In this study, we propose a two-stage framework to address these limitations. In the first stage, a one-dimensional convolutional neural network autoencoder automatically learns latent representations from protein sequences. The quality of these features is evaluated through reconstruction error, which reflects how accurately the model reconstructs the original sequence. In the second stage, these learned features are combined with amino acid frequency-based features to form a hybrid feature set for predicting protein-protein interactions. A systematic comparison is performed between models trained on frequency features alone and models trained on the hybrid representation. The comparison showed that incorporating latent features derived from the one-dimensional convolutional neural network improved the models' performance in predicting protein-protein interactions. The dataset was split into training, validation, and test sets. Nested cross-validation was employed, with inner loops for hyperparameter tuning and outer loops for model selection. The random forest classifier achieved the best performance, with a mean area under the receiver operating characteristic curve (ROC-AUC) of 0.91 and a test F1-score of 0.87. These results highlight the effectiveness of integrating deep feature learning with ensemble methods for predicting protein-protein interactions and build on previous work on this fundamental problem.
Author Summary
Protein-protein interactions are fundamental to all biological processes. However, predicting these interactions remains a key problem in molecular biology. Computational approaches have been tested to address this problem. We applied a mix of machine learning and deep learning to gain insight into the qualities of proteins that engage in interactions. First, we trained a deep learning model that automatically learned features of the primary sequence and related characteristics, reducing bias in the actual prediction process. We combined these features, or latent representations, with amino acid frequency features of protein sequences, and called the two together "hybrid features." Then we systematically compared amino acid frequency features alone with hybrid features across four different machine learning classifiers. Our results suggest that the random forest classifier performed best of the four at predicting interactions between proteins. We propose that this approach could improve the efficiency of testing protein-protein interactions at the bench and may have applications to other biologically relevant molecular interactions.
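The amino acid frequency features described above can be computed as a 20-dimensional composition vector per protein. The sketch below assumes simple relative frequencies over the 20 standard residues (the exact feature definition is not given here), and `hybrid_features` is a hypothetical illustration of concatenating those frequencies with autoencoder latent vectors for a protein pair.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(seq):
    """Relative frequency of each standard amino acid; non-standard residues are ignored."""
    counts = {aa: 0 for aa in AMINO_ACIDS}
    for residue in seq.upper():
        if residue in counts:
            counts[residue] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def hybrid_features(seq_a, seq_b, latent_a, latent_b):
    """Hypothetical pair representation: both compositions followed by both latent vectors."""
    return aa_composition(seq_a) + aa_composition(seq_b) + list(latent_a) + list(latent_b)
```

The resulting fixed-length vector could then be passed to any of the classifiers compared in the study.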
Shahid, S.; Lundin, D.; Rozman Grinberg, I.; Sjöberg, B.-M.
The prevalent transcriptional repressor NrdR binds to highly conserved prokaryotic sequences in the promoter regions of operons encoding the essential enzyme ribonucleotide reductase. The NrdR binding sites consist of two partially palindromic 16 bp sequences (NrdR boxes) separated by a 15-16 bp linker sequence. We have assessed the requirement of both boxes for binding and the propensity of different NrdRs to bind to heterologous binding sites, and we have found that the linker is constrained only in length, not in sequence. Because we have observed several deviations from the conserved sequences of the NrdR boxes, we here test the conservation requirements of individual base pairs in the NrdR boxes using a synthetic DNA fragment (Synt DNA) to which the NrdR proteins from the actinomycete Streptomyces coelicolor and the gammaproteobacterium Escherichia coli bind as well as they bind to their homologous binding sites. By introducing individual mutations into Synt DNA and testing the binding capacity of NrdR from S. coelicolor and E. coli, we expand our understanding of the criteria needed to build a functional binding site for the NrdR repressor.
GAYRAUD, G.; Davila Felipe, M.; Padiolleau-Lefevre, S.; Maffucci, I.; Issouani, E. M.; Guerin, M.; Da Ponte, H.
Aptamers are single-stranded DNA or RNA molecules selected for their high affinity and specificity in binding target molecules, similar to antibodies. They are commonly selected through the SELEX process, which iteratively exposes a random sequence library to a target and retains the sequences showing good binding properties. To improve Lyme disease detection, we propose designing aptamers that specifically bind to the CspZ protein on the surface of Borrelia burgdorferi, the bacterium responsible for the disease. Starting from a thirteen-round SELEX process that yielded in vitro sequence candidates, we propose a holistic process that selects new sequence candidates in silico, which are then validated experimentally. Our approach relies on 1) using Machine Learning (ML) techniques, specifically a Restricted Boltzmann Machine (RBM), to digitally replicate the last round of the SELEX process; 2) integrating insights from text analysis methods, such as word2vec and n-grams, into the RBM model trained on the final-round SELEX dataset to represent newly generated sequences and compare them with in vitro candidates; 3) selecting in silico sequences with strong potential to bind to the CspZ protein; and 4) experimentally validating the in silico sequences selected in step 3. Our holistic approach combines biological insights with statistical models to improve the efficiency and outcome of the SELEX process. We enhance the RBM model, designed to replicate the distribution of the final SELEX round, by integrating geometric representations of sequences, which is especially advantageous when datasets are limited relative to the vast sequence space. In addition, the approach provides in silico sequence candidates with strong binding properties.
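Text-analysis methods such as n-grams and word2vec treat a nucleotide sequence as a sentence of overlapping k-mers ("words"). The sketch below shows only that tokenization step plus a simple k-mer-profile cosine similarity for comparing a generated sequence with an in vitro candidate. It assumes k = 3 as a hypothetical choice and does not reproduce the authors' RBM or word2vec pipeline.

```python
from collections import Counter
from math import sqrt


def kmers(seq, k=3):
    """Overlapping k-mers ('words') of a nucleotide sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def cosine_similarity(seq_a, seq_b, k=3):
    """Cosine similarity between the k-mer count profiles of two sequences."""
    a, b = Counter(kmers(seq_a, k)), Counter(kmers(seq_b, k))
    dot = sum(a[w] * b[w] for w in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0
```

Embedding methods such as word2vec would replace these raw counts with learned vectors for each k-mer, but the tokenization into words is the same.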
Parida, A. S.; Kumar, A.; Tiwari, B.
Long interspersed nuclear element-1 (LINE-1) elements are the only autonomously active transposable elements in the human genome. These elements are known to play an important role in shaping the transcriptome. Beyond their established role in retrotransposition, LINE-1 sequences affect gene regulation during post-transcriptional processing. Exonization is one such mechanism, in which LINE-1 sequences integrated into the genome are incorporated into transcripts through alternative splicing, producing new transcript isoforms. Our work focuses on the effect of LINE-1-associated exonization on the formation of transcript isoforms. Using non-small cell lung cancer (NSCLC) as a model, we conducted a detailed transcriptome study that combines splice junction profiling with gene expression data. Our results show that LINE-1 sequences are often included as exons in host transcripts, leading to the formation of new exons and various isoforms. These events are supported by robust splice junction evidence, demonstrating their reliability and reproducibility. In particular, repeated analyses consistently revealed certain LINE-1 exonization events, indicating that LINE-1 elements act as recurrent sources of splice-ready sequences. Although exonization does not necessarily affect total gene expression levels, our study reveals that it contributes to transcript diversity, and the resulting isoform diversity may influence gene function. This study is limited to NSCLC, but exonization events likely play a crucial role in altering RNA diversity in cancers. The study therefore provides new insights into how transposable elements modify gene structure and function during cancer development.
Siddiqi, M. A.; Kumar, H.; Mazumder, M.
Influenza A virus (IAV) causes significant morbidity and mortality worldwide. Understanding how viral RNAs may regulate host genes through microRNA-like mechanisms can clarify pathogenesis and reveal therapeutic targets. In this study, we screened all eight IAV H3N2 RNA segments (PB2, PB1, PA, HA, NP, NA, M, and NS) using an ab initio computational pipeline; five segments (PB2, PB1, PA, HA, and M) met the VMir scoring threshold for further analysis, while NP, NA, and NS were excluded due to low pre-miRNA scores. Mature miRNAs were identified using MatureBayes, and target genes in the human genome were predicted with the miRDB server. From these targets, we selected two genes per qualifying segment (10 genes total) based on their functional relevance to influenza infection and supporting literature; all selected genes are unique to their respective segment. We identified 10 segment-specific target genes (IFNL1, DDX60, SAMHD1, MAVS, IRF4, BIRC2, AGO1, MAP3K1, NOD1, and TNFAIP1) and one common target across all five analyzed segments (CADM2). Gene Ontology and pathway analyses showed enrichment in interferon signaling, RIG-I-like receptor pathways, antiviral restriction, RNA interference, and inflammatory responses. Literature supports roles for these genes in pulmonary and antiviral innate immunity. Our findings provide a basis for experimental validation and may help the research community better understand influenza virus pathogenesis and identify novel therapeutic candidates.
Messmer, M.; de Carpentier, F.; Lam, E.; Hong, M.; Wakao, S.; Schroda, M.; Niyogi, K. K.
Chlamydomonas reinhardtii is a model green alga extensively used to study photosynthesis and cilia with molecular biology and genetics. Electroporation is a common technique for introducing DNA into the nuclear genome, which is essential for generating mutant collections and expressing transgenes. Here, we describe a simple, fast, and efficient protocol for transforming strains with an intact cell wall. It achieves good transformation efficiency without cell wall digestion or commercial kits and is compatible with the widely available Gene Pulser electroporation system.
Key features
- High transformation efficiency of Chlamydomonas reinhardtii strains with an intact cell wall.
- Faster than currently available electroporation protocols.
Watcharapalakorn, A.; Poyomtip, T.; Tawonkasiwattanakun, P.; Dewi, P. K. K.; Thomrongsuwannakij, T.; Mahawan, T.
Purpose: To determine whether circadian timing defines critical molecular windows in myopia development and to assess the transferability of circadian gene programs across ocular tissues, disease stages, and species.
Methods: Publicly available retinal and choroidal RNA-seq datasets from chick models of form-deprivation myopia were analyzed using unsupervised transcriptomic profiling and multistage machine-learning classification. Circadian windows were defined based on Zeitgeber time, and samples were grouped accordingly for downstream analyses. Classification model robustness was evaluated through cross-tissue and cross-stage validation and further assessed using external validation in an independent dataset. Functional translation to humans was examined using ortholog-based Gene Ontology enrichment analysis to identify conserved biological processes and higher-order regulatory pathways.
Results: A circadian critical window at ZT8-ZT12 exhibited the strongest transcriptional divergence during both myopia onset and progression. Gene signatures derived from this window generalized across retina and choroid and remained predictive across disease stages, supporting coordinated molecular regulation between ocular tissues. External validation confirmed the reproducibility of these signatures despite differences in experimental design and gene coverage. Functional mapping revealed that conserved molecular components in chicks are reorganized into more complex neuroendocrine and regulatory networks in humans, indicating cross-species conservation with increased functional complexity.
Conclusions: Circadian timing strongly shapes myopia-related gene expression and underlies coordinated retina-choroid signaling. These findings highlight circadian biology as a key factor in refractive development and suggest that time-dependent mechanisms may influence myopia susceptibility, progression, and response to treatment.
Dongardive, V.; Jathar, S.; Srivastava, J.; Tripathi, V.
The cell cycle comprises distinct phases and is tightly regulated at the molecular level. Two key events occur during the cell cycle: DNA duplication during S phase and chromosome segregation during mitosis. Accurate cell cycle progression, achieved through faithful chromosome segregation, is essential for maintaining genomic fidelity. Long noncoding RNAs (lncRNAs) are a subclass of noncoding RNAs longer than 200 nucleotides that form RNA-protein complexes (RNPs) to regulate various biological processes. Herein, we demonstrate that the lncRNA NORM is involved in regulating the cell cycle by maintaining proper chromosome segregation. NORM exhibited G2 phase-specific expression, and depletion of NORM resulted in a significant G2/M arrest. NORM-depleted cells failed to progress through mitosis and showed defects in chromosome segregation. We further demonstrated that NORM binds to proteins such as Plk1 and Nsun2. Depletion of NORM hindered the interaction between Plk1 and Bub1, resulting in reduced kinetochore localization of Plk1 during prometaphase. Our results also show that depletion of NORM affects the binding of Nsun2 protein to CDK1 mRNA and, consequently, the stabilization of CDK1 at the protein level. Altogether, our results demonstrate that NORM regulates chromosome segregation by mediating the interaction between Plk1 and Bub1.
Tokmakov, A. A.
Xenopus is a genus of entirely aquatic frogs found in sub-Saharan Africa. The complete genomes of two Xenopus species, Xenopus laevis and Xenopus tropicalis, have been sequenced, annotated, and made publicly available. The two species inhabit markedly different environments: X. tropicalis lives in the hot equatorial regions of Africa, whereas X. laevis resides in the cooler climates of southern Africa. In the present study, mutational profiling, comparative homology modeling, and computational bioinformatics were used to identify features of adaptive evolution in Xenopus endonuclease G (EndoG) proteins. Multiple characteristics of EndoG isozymes were found to vary considerably between the two Xenopus species living in different locations. Most notably, compared with those from the thermophilic X. tropicalis, EndoG proteins from the psychrophilic X. laevis exhibit increased contents of charged and polar residues, a higher pI, and higher intramolecular interaction energies, B factors, molecular void volumes, and solvent accessibilities, but decreased contents of nonpolar and aromatic amino acids and lower hydrophobicity, buried surface area, and molecular packing density. The observed differences strongly suggest that temperature plays a dominant role in EndoG diversification. Evaluation of intramolecular interaction energies appears to be a particularly sensitive and discriminative framework for assessing protein divergence at the structural level. Overall, this study highlights the diversification of homologous proteins in ectothermic vertebrates and provides mechanistic insight into protein adaptation to contrasting environments.
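Sequence-level differences such as hydrophobicity and charged-residue content can be illustrated with two simple descriptors. The sketch below assumes the Kyte-Doolittle scale and its grand average of hydropathy (GRAVY); the study may have used different measures, and structural properties such as B factors or packing density require 3D models and are not covered.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
CHARGED = set("DEKR")


def gravy(seq):
    """Grand average of hydropathy: mean Kyte-Doolittle score per standard residue."""
    values = [KYTE_DOOLITTLE[r] for r in seq.upper() if r in KYTE_DOOLITTLE]
    return sum(values) / len(values)


def charged_fraction(seq):
    """Fraction of residues charged at neutral pH (D, E, K, R; His excluded)."""
    seq = seq.upper()
    return sum(r in CHARGED for r in seq) / len(seq)
```

Under this assumption, a cold-adapted ortholog with more charged and polar residues would show a lower GRAVY and a higher charged fraction.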
C A, A.; Upadhayay, R.; Patankar, S. A.
Toxoplasma gondii is a widespread human pathogen with multiple clinically relevant stages in its complex life cycle, including fast-replicating tachyzoites and latent bradyzoites. Bradyzoite differentiation is triggered by stress responses that lead to changes in transcription, translation, and metabolism. This report addresses two aspects of this process: first, whether proteins that play roles in bradyzoite differentiation are specific to T. gondii and other bradyzoite-forming parasites of the Sarcocystidae family, and second, whether new bradyzoite differentiation proteins can be identified in T. gondii. To answer these questions, a phylogenetic approach was used, comparing the proteomes of selected members of the Sarcocystidae family that form morphologically different bradyzoite cysts with those of members of the Eimeriidae family that do not form cysts. This approach yielded 8 distinct clusters of T. gondii proteins reflecting different conservation patterns; for example, one cluster showed conservation among all organisms, while another showed conservation in bradyzoite cyst-forming organisms. Known T. gondii proteins involved in bradyzoite differentiation were found in all clusters, indicating that this process uses both highly conserved and bradyzoite-specific pathways. Importantly, the cluster containing proteins conserved in bradyzoite-forming organisms contained several known bradyzoite regulators and will be a source for identifying novel T. gondii proteins involved in bradyzoite differentiation.
Kim, H.; Cheong, K.; Jeon, J.; Choi, G.; Koh, J.; Song, H.; Hue, Y.; Nam, Y.; Choi, B.; Lim, Y.-J.; Choi, J.; Kim, K.-T.; Lee, Y.-H.
Magnaporthe oryzae, the rice blast fungus, serves as a model organism for research on molecular plant-microbe interactions. Studies on the pathogenic mechanism of this fungus have revealed many genes involved in signaling pathways. As multi-omics data have become available, genome-level studies have been conducted to uncover the biological processes underlying the pathogenesis of M. oryzae. Identifying the genome-wide protein-protein interaction (PPI) network is one such omics-level approach, which helps elucidate signaling and regulatory pathways. However, existing biological network resources for M. oryzae are insufficient for deciphering pathogenesis mechanisms because of abundant false positives and false negatives. In this study, a reliable PPI network database of M. oryzae, MagNet, was constructed using three methods: homology-based Interolog search, co-expression network construction, and domain-domain interaction (DDI)-based prediction. Combining the three approaches generated a pan-network of 5,600,976 interactions, including 217,531 high-confidence interactions supported by all three methods. Experimental data on M. oryzae PPIs indicated that our network predicts PPIs more accurately than previously constructed databases. MagNet provides integrated biological network data that can help elucidate the molecular mechanisms of the rice blast fungus. The PPI data can be accessed at https://magnet.scnu.ac.kr.
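The pan-network and its high-confidence core correspond to the union and intersection of the three predicted interaction sets. A minimal sketch, assuming each method outputs undirected protein pairs; the protein identifiers in the checks are hypothetical placeholders.

```python
def normalize(pairs):
    """Represent undirected interactions as sorted tuples so (a, b) equals (b, a)."""
    return {tuple(sorted(p)) for p in pairs}


def combine_networks(interolog, coexpression, ddi):
    """Return (pan_network, high_confidence) from three predicted PPI sets."""
    sets = [normalize(s) for s in (interolog, coexpression, ddi)]
    pan = set().union(*sets)  # supported by at least one method
    consensus = sets[0] & sets[1] & sets[2]  # supported by all three methods
    return pan, consensus
```

Normalizing pair order first matters: without it, the same interaction reported as (A, B) by one method and (B, A) by another would be missed by the intersection.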
Casajuana, B.; Casals-Franch, R.; Lopez Garcia de Lomana, A.; Marti-Puig, P.; Villa-Freixa, J.
Parameter estimation in nonlinear biological dynamical systems is a difficult inverse problem because the governing equations are often stiff or oscillatory, the data are sparse and noisy, and the objective landscape is non-convex. Physics-informed neural networks (PINNs) offer an alternative to purely simulation-based calibration by representing state trajectories with neural networks while penalizing violations of the governing equations. This paper studies the empirical reliability of PINNs for recovering the parameters of the repressilator, a synthetic genetic oscillator formed by three cyclically repressive genes. We use synthetic time-series generated from the standard ordinary differential equation model and train inverse PINNs to estimate the production parameter β and the Hill coefficient n. The study varies observation noise, partial observation of repressors, sampling density, sensitivity to initial parameter guesses, and the difference between stable and oscillatory regimes. The results show that PINNs can reconstruct trajectories accurately when the model structure is correct and the three repressors are observed, but parameter recovery is more fragile than trajectory fitting. Noise, sparse sampling, unobserved variables, and unfavorable initial guesses increase the risk of biased estimates. The stable regime is easier to reconstruct, whereas the oscillatory regime provides richer information but also exposes optimization sensitivity. These findings support PINNs as a useful reverse-engineering tool for small gene-regulatory ODE models, while highlighting the need for repeated runs, uncertainty reporting, and experimental designs that improve identifiability.
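Synthetic training data of this kind can be produced by integrating a repressilator ODE. The sketch below assumes a common nondimensional protein-only form, dx_i/dt = β / (1 + x_{i-1}^n) - x_i with indices taken cyclically; whether the paper uses exactly this parameterization (rather than, for example, the mRNA-protein Elowitz-Leibler model) is an assumption. It uses a fixed-step fourth-order Runge-Kutta integrator, and the PINN itself is not shown.

```python
def repressilator_rhs(x, beta, n):
    """dx_i/dt = beta / (1 + x_{i-1}^n) - x_i; x[-1] wraps so gene 3 represses gene 1."""
    return [beta / (1.0 + x[i - 1] ** n) - x[i] for i in range(3)]


def simulate(x0, beta, n, t_end, dt=0.01):
    """Classic fixed-step RK4; returns the state at every step, including t = 0."""
    x = list(x0)
    traj = [list(x)]
    for _ in range(int(round(t_end / dt))):
        k1 = repressilator_rhs(x, beta, n)
        k2 = repressilator_rhs([xi + 0.5 * dt * ki for xi, ki in zip(x, k1)], beta, n)
        k3 = repressilator_rhs([xi + 0.5 * dt * ki for xi, ki in zip(x, k2)], beta, n)
        k4 = repressilator_rhs([xi + dt * ki for xi, ki in zip(x, k3)], beta, n)
        x = [xi + dt / 6.0 * (a + 2 * b + 2 * c + d) for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]
        traj.append(list(x))
    return traj
```

With β = 2 and n = 1, trajectories settle to the stable steady state x* = 1, which solves x = 2 / (1 + x). With β = 10 and n = 3, the steady state is unstable and the system oscillates. These are the two regimes contrasted in the abstract. Adding noise or subsampling these trajectories would mimic the observation conditions studied.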
Shen, J.; Tang, S.; Xia, Y.; Qin, J.; Xu, H.; Tan, Z.
Background: Conventional models of human ribosomal DNA (rDNA) array organization have historically relied on transcription-centric boundaries, partitioning the unit into a ~13 kb rDNA transcription region and a monolithic ~31 kb intergenic spacer (IGS). While our previous identification of Duplication Segment Units (DSUs) mapped these arrays based on an intuitive analysis of the microsatellite density landscape of the complete reference human genome, our present deep mining of this landscape has revealed a more accurate rDNA Gene Unit Pattern.
Methods & Results: In this study, we conducted a deep mining analysis of our previously established microsatellite density landscape of the T2T-CHM13 assembly, focusing specifically on nucleolar organizing regions (NORs). We propose a more accurate rDNA Gene Unit Pattern containing a (CTTT)n microsatellite aggregation ahead of the rDNA gene and a (CT)n microsatellite aggregation behind it, rather than a pattern featuring an IGS region inserted between two rDNA genes.
Conclusions: The correct rDNA gene pattern of the human genome probably includes a (CTTT)n microsatellite aggregation ahead of the gene and a (CT)n microsatellite aggregation behind it, which possibly constitute cis- and trans-regulatory regions; the (CTTT)n and (CT)n microsatellite aggregations may provide two different locally stable DNA structures for regulatory protein binding.
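Locating (CTTT)n and (CT)n aggregations starts with finding tandem repeat runs. A minimal regular-expression sketch follows. The minimum copy numbers and the toy sequence are hypothetical choices, and the study's density-landscape method is not reproduced here.

```python
import re


def find_repeats(seq, motif, min_copies):
    """Return (start, end, copies) for each maximal tandem run of `motif`.

    Runs must contain at least `min_copies` consecutive copies; coordinates are
    0-based and half-open, as in Python slicing.
    """
    pattern = re.compile("(?:%s){%d,}" % (re.escape(motif), min_copies))
    return [
        (m.start(), m.end(), (m.end() - m.start()) // len(motif))
        for m in pattern.finditer(seq.upper())
    ]
```

Counting such runs in sliding windows along an assembly would then give a simple repeat-density profile.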
Popinga, A. N.; Forman, J.; Svetlov, D.; Vo, H. D.; Munsky, B. E.
Biological data are prone to both intrinsic and extrinsic noise and to variability between experimental replicates. That same stochasticity and heterogeneity can carry information about underlying biochemical mechanisms but, if not incorporated into modeling and probabilistic inference, can also bias parameter estimates and misguide predictions and, subsequently, experiment design. Mechanistic inference typically requires lengthy simulations (e.g., the Stochastic Simulation Algorithm (SSA)), approximations to chemical master equation (CME) solutions that lack rigorous error tracking, or deterministic averaging that lacks the complexity necessary to reflect the data. We introduce the Stochastic System Identification Toolkit (SSIT), a fast, flexible, and open-source software package available on GitHub that makes use of MATLAB's efficient and diverse computational architecture. The SSIT is designed for building, simulating, and solving chemical reaction models using ODEs, moments, SSA, Finite State Projection truncations of the CME, or hybrid methods; sensitivity analysis and Fisher information quantification; parameter fitting using likelihood- or Bayesian-based methods; handling of experimental noise and measurement errors using probabilistic distortion operators; and sequential experiment design that lets users save time and resources while extracting as much information as possible from their data. The SSIT also offers advanced modeling tools, including model reduction methods for increased efficiency and joint fitting of models and datasets with overlapping reactions or parameters. For ease and speed of use, the SSIT provides a graphical user interface and ready-made, adaptable pipelines that can be run in the background from the command line or on high-performance computing clusters.
We demonstrate features of the SSIT on two experimental datasets: the first consists of published mRNA count data, obtained by single-cell single-molecule fluorescence in situ hybridization, that reflect the response of Saccharomyces cerevisiae yeast cells to osmotic shock; the second consists of single-cell RNA sequencing measurements of 151 activating genes in breast cancer cells following treatment with dexamethasone.
Author summary
We present the Stochastic System Identification Toolkit (SSIT) to model, fit, and predict any data that can be interpreted as changing populations or counts through time, including but not limited to single-cell experiments, economics, epidemiology, ecology, sociology, agriculture, and biotechnology. The SSIT was constructed particularly for stochastic modeling, which is important for systems whose states may fluctuate significantly from mean behavior, thus affecting the inference of the underlying rate parameters and predictions of subsequent behavior. The SSIT provides statistical inference tools for parameter estimation; sensitivity analysis and information calculation; handling of distortions to probability distributions caused by experimental or measurement processes (e.g., dropout in single-cell RNA sequencing data and total fluorescence intensities versus spot counting/puncta analysis); and quantitative design of experiments. The SSIT also offers a variety of complex modeling tools, including model reduction methods and fitting of combined models/datasets that share some behavior but remain distinct (e.g., different genes responding to a single stimulus). The SSIT generates pipelines for easy, efficient analyses that run in the MATLAB environment, in the background from the command line, or on high-performance computing clusters, thus helping users make informed, time- and cost-effective decisions about their next set of experiments.
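The Stochastic Simulation Algorithm can be illustrated on the simplest gene-expression model: constitutive mRNA production at rate k and first-order degradation at rate γ. This is a generic Python sketch of Gillespie's direct method, not the SSIT's MATLAB implementation or API, and the rates in the checks are hypothetical. For this model, the stationary copy-number distribution is Poisson with mean k/γ, which gives a quick sanity check.

```python
import random


def ssa_birth_death(k, gamma, t_end, x0=0, rng=None):
    """Gillespie direct method for 0 -> mRNA (rate k) and mRNA -> 0 (rate gamma * x).

    Returns the mRNA copy number at time t_end.
    """
    rng = rng or random.Random()
    t, x = 0.0, x0
    while True:
        a_birth, a_death = k, gamma * x
        a_total = a_birth + a_death
        t += rng.expovariate(a_total)  # exponential waiting time to the next reaction
        if t > t_end:
            return x
        # Pick which reaction fires, with probability proportional to its propensity.
        if rng.random() * a_total < a_birth:
            x += 1
        else:
            x -= 1
```

Repeating this simulation many times samples the distribution that a Finite State Projection solution of the truncated chemical master equation would compute directly, which is why direct CME methods are often preferred when fitting data.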
Trypsteen, W.; Vynck, M.; Untergrasser, A.; Whale, A. S.; Rodiger, S.; Dobnik, D.; Bogozalec Kosir, A.; Milavec, M.; Kubista, M.; Pfaffl, M. W.; Nour, A. A.; Young-Kyung, B.; Bustin, S. A.; Calin, G.; Chen, Y.; Cleveland, M. H.; De Falco, A.; Forootan, A.; O'Sullivan, D. M.; Devonshire, A. S.; Foy, C. A.; Fraley, S. I.; Gleerup, D. G.; He, H.-J.; Hellemans, J.; Lievens, A.; Lind, G. E.; Porco, D.; Romsos, E. L.; Thas, O.; Drandi, D.; de Tayrac, M.; Taly, V.; Huggett, J. F.; Vandesompele, J.; De Spiegelaere, W.
Digital PCR (dPCR) is a powerful technology for absolute quantification of nucleic acids, valued for its accuracy, sensitivity, and repeatability. Yet the commercialization of different instruments with proprietary software has introduced challenges for data analysis, interoperability, and comparability. We therefore present the Digital PCR Data Essentials Standard (DDES), a lightweight, human- and machine-readable, cross-platform data standard developed in collaboration with the dPCR community. The standard consists of three file types designed to enable both manual inspection and automated analysis: (i) a main file summarizing experiment- and reaction-level (meta)data; (ii) an assay file describing targets and detection chemistry; and (iii) intensity files capturing partition-level raw fluorescence data per reaction. DDES supports a wide range of current dPCR applications, including singleplex and multiplex assays and endpoint and real-time readouts, and will be curated to accommodate future dPCR developments. By harmonizing the data structure, DDES lays the foundation for FAIR dPCR data practices and supports improved software compatibility, collaborative and reproducible research, and future dPCR data repositories.
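Absolute quantification in dPCR relies on a Poisson correction. If k of n partitions are positive, the mean number of target copies per partition is λ = -ln(1 - k/n), and dividing by the partition volume gives a concentration. The sketch below applies this standard calculation to counts that could be derived from partition-level data. The example counts and the 1 nL partition volume are hypothetical, and the DDES file layout itself is not modeled.

```python
from math import log


def dpcr_concentration(positive, total, partition_volume_nl):
    """Copies per microliter from positive/total partition counts (Poisson-corrected)."""
    if not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total (all-positive runs are saturated)")
    lam = -log(1.0 - positive / total)  # mean copies per partition
    return lam / (partition_volume_nl * 1e-3)  # 1 nL = 1e-3 uL
```

For example, 6,321 positive partitions out of 10,000 at 1 nL each corresponds to λ ≈ 1 copy per partition, or about 1,000 copies/µL.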
Hashimoto, S.; Yamada, K.; Izawa, T.
Even in the era of genomics and pan-genomics, the gene remains the fundamental unit of heredity. Accurate visualization of gene structures provides essential insights into the organization, regulation, and evolution of each gene. Representing elements such as exons, introns, untranslated regions, and functional domains in a clear and interpretable format is particularly important for analyzing complex gene architectures and for communicating results effectively. Although several visualization tools are available, many are limited in their ability to incorporate user-defined annotations or to support interactive and customizable figure generation, which reduces their utility in modern analytical workflows. To address these limitations, we developed geneSTRUCTURE, available as both a command-line interface and a web-based application, which provides flexible and user-friendly visualization of gene structures from the widely used annotation formats GFF3 and GTF. In addition to visualizing core gene components, the platform allows users to overlay supplemental annotations, including mutation sites and protein domains, and to adjust layout features in real time. By combining a modern interface with annotation flexibility and high-resolution output, geneSTRUCTURE offers a robust solution for gene-level visualization.
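Drawing a gene model from GFF3 begins with grouping exon features by their parent transcript and deriving introns from the gaps between them. A minimal parsing sketch follows. It uses only the nine tab-separated GFF3 columns and the `Parent` attribute, does not decode percent-escaped attribute values, and is not geneSTRUCTURE's code. The records in the checks are hypothetical.

```python
from collections import defaultdict


def parse_attributes(field):
    """Parse the GFF3 column-9 'key=value;key=value' string into a dict."""
    attrs = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = value
    return attrs


def exons_by_transcript(gff3_text):
    """Map transcript ID -> sorted (start, end) exon coordinates (1-based, inclusive)."""
    exons = defaultdict(list)
    for line in gff3_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue
        start, end = int(cols[3]), int(cols[4])
        # An exon shared by several transcripts lists comma-separated parents.
        for parent in parse_attributes(cols[8]).get("Parent", "").split(","):
            if parent:
                exons[parent].append((start, end))
    return {tid: sorted(coords) for tid, coords in exons.items()}


def introns(exon_coords):
    """Introns are the gaps between consecutive sorted exons."""
    return [(a_end + 1, b_start - 1) for (_, a_end), (b_start, _) in zip(exon_coords, exon_coords[1:])]
```

A renderer would then draw exons as boxes and the derived introns as connecting lines for each transcript.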
Grinstead, S.; Nemchinov, L. G.
We recently reported the identification of endogenous viral elements (EVEs) originating from the Caulimoviridae family within the alfalfa (Medicago sativa L.) genome. Our subsequent identification, by high-throughput sequencing, of ubiquitous rhabdoviral elements in infected and healthy alfalfa tissues prompted us to suggest that the alfalfa genome might also be populated with integrated rhabdoviruses. Bioinformatics analysis of 26 publicly available alfalfa genomes confirmed this suggestion. We found multiple non-retroviral segments of the Rhabdoviridae family, belonging to the genera Betanucleorhabdovirus and Betacytorhabdovirus, that appeared to be stable constituents of the host genome. As such, they could potentially acquire functional roles in alfalfa's development and response to environmental stresses. We believe this study reveals the first documented case of rhabdoviruses integrated into the alfalfa genome.